Lexicon Acquisition with and for Symbolic NLP-Systems – a Bootstrapping Approach
Abstract
We present a method for applying a broad-coverage LFG grammar of German to semi-automatic lexicon acquisition from corpora. Corpus instances that unambiguously illustrate a given subcategorization frame are identified by comparing the number of analyses the grammar assigns to each instance under different hypothetical lexicon entries for the candidate verb. Filtering conditions stated over the feature representations output by the grammar further restrict the set of sentences on which the automatic extraction step is based. Experiments show that the grammar-based method produces better results than a method based on patterns in a corpus query language.
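The selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes one simple reading of "comparing the numbers of analyses": a sentence counts as a unique illustration of a frame when the grammar finds analyses under exactly that hypothetical lexicon entry and under no other. The parser interface (`Parser`), the frame labels `trans`/`ditrans`, the flat f-structure dicts, the canned `TOY_ANALYSES` standing in for the German LFG grammar, and the `not_passive` filter condition are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Simplification: an f-structure is a flat attribute-value dict
# (real LFG f-structures are nested).
FStructure = Dict[str, object]

# Hypothetical parser interface (not the API of any real LFG system): given a
# sentence and the subcategorization frame temporarily assumed for the
# candidate verb, return every analysis the grammar assigns.
Parser = Callable[[str, str], List[FStructure]]


def unique_frame(sentence: str, frames: Sequence[str], parse: Parser) -> Optional[str]:
    """Return the only hypothesized frame under which the sentence parses, else None."""
    counts = {frame: len(parse(sentence, frame)) for frame in frames}
    licensing = [frame for frame, n in counts.items() if n > 0]
    return licensing[0] if len(licensing) == 1 else None


def select_instances(
    sentences: Sequence[str],
    frames: Sequence[str],
    parse: Parser,
    keep: Callable[[FStructure], bool],
) -> Dict[str, List[str]]:
    """Group sentences by the frame they uniquely illustrate; a sentence is kept
    only if all its analyses under that frame pass the feature-based filter."""
    selected: Dict[str, List[str]] = {frame: [] for frame in frames}
    for sentence in sentences:
        frame = unique_frame(sentence, frames, parse)
        if frame is None:
            continue  # unparsable under every frame, or ambiguous between frames
        if all(keep(fs) for fs in parse(sentence, frame)):
            selected[frame].append(sentence)
    return selected


# Toy stand-in for the grammar: invented analyses keyed by (sentence, frame).
# A missing key means the grammar finds no analysis under that frame.
TOY_ANALYSES: Dict[Tuple[str, str], List[FStructure]] = {
    # Dative "ihm" is only licensed by the ditransitive entry.
    ("Sie gibt ihm das Buch.", "ditrans"): [{"PRED": "geben<SUBJ,OBJ,OBJ2>"}],
    # "ihr" is either a dative pronoun or a possessive determiner: ambiguous.
    ("Sie gibt ihr Geld.", "trans"): [{"PRED": "geben<SUBJ,OBJ>"}],
    ("Sie gibt ihr Geld.", "ditrans"): [{"PRED": "geben<SUBJ,OBJ,OBJ2>"}],
    # Unique frame, but the analysis is passive (argument list omitted).
    ("Ihm wird das Buch gegeben.", "ditrans"): [{"PRED": "geben", "PASSIVE": True}],
}


def toy_parse(sentence: str, frame: str) -> List[FStructure]:
    return TOY_ANALYSES.get((sentence, frame), [])


def not_passive(fs: FStructure) -> bool:
    # Hypothetical filter condition: discard passive analyses, whose surface
    # argument realization differs from the active frame.
    return not fs.get("PASSIVE", False)


result = select_instances(
    ["Sie gibt ihm das Buch.", "Sie gibt ihr Geld.", "Ihm wird das Buch gegeben."],
    ["trans", "ditrans"],
    toy_parse,
    not_passive,
)
print(result)  # {'trans': [], 'ditrans': ['Sie gibt ihm das Buch.']}
```

With real grammar output, `toy_parse` would be replaced by a call into the parser with the candidate verb's lexicon entry swapped for each hypothesis. The ambiguous `ihr` sentence shows why analyses are counted under every candidate frame rather than checking one frame in isolation: it parses under both entries and is therefore not used as evidence for either.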
Similar resources
Syntactic Category Learning as Iterative Prototype-Driven Clustering
We lay out a model for minimally supervised syntactic category acquisition which combines psychologically plausible concepts from standard NLP part-of-speech tagging applications with simple cognitively motivated distributional statistics. The model assumes a small set of seed words (Haghighi and Klein, 2006), an approach with motivation in (Pinker, 1984)’s semantic bootstrapping hypothesis, an...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) on electronic devices. Furthermore, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Multilingual Computational Semantic Lexicons in Action: The WYSINNWYG Approach to NLP
Much effort has been put into computational lexicons over the years, and most systems give much room to (lexical) semantic data. However, in these systems, the effort put on the study and representation of lexical items to express the underlying continuum existing in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has remained embryonic. A sense enumeration approach fai...
03. Tools and Procedures for the Acquisition of Morphological and Syntactic Information from Corpora
Over the past decades, the importance of the lexicon has increased in both natural language processing (NLP) and linguistic theory. Within NLP, much of the early research focused on isolated ‘toy’ tasks, treating the lexicon as a peripheral component. These days, the focus is on constructing systems suitable for the treatment of large, naturally occurring texts, and therefore rich lexical resou...
Generating Multiwords from MEDLINE in the SPECIALIST Lexicon
Multiwords are vital to better NLP systems for more effective and efficient parsers, refining information retrieval searches, enhancing precision and recall in NLP applications, etc. The Lexical Systems Group (LSG) enhanced the coverage of multiwords in the Lexicon to provide a more comprehensive resource. This paper describes a new systematic approach to lexical multiword acquisition from MEDL...
Publication date: 1998